Bilexical Embeddings for Quality Estimation
Authors
Abstract
This paper describes the SHEF submissions for the three sub-tasks of the Quality Estimation shared task of WMT17, namely: (i) a word-level prediction system using bilexical embeddings, (ii) a phrase-level labelling approach based on the word-level predictions, and (iii) a sentence-level prediction system using word embeddings and handcrafted baseline features. Results are promising for the sentence-level approach, but still very preliminary for the other two levels.
Similar resources
Tailoring Word Embeddings for Bilexical Predictions: An Experimental Comparison
We investigate the problem of inducing word embeddings that are tailored for a particular bilexical relation. Our learning algorithm takes an existing lexical vector space and compresses it such that the resulting word embeddings are good predictors for a target bilexical relation. In experiments we show that task-specific embeddings can benefit both the quality and efficiency in lexical predic...
Learning Task-specific Bilexical Embeddings
We present a method that learns bilexical operators over distributional representations of words and leverages supervised data for a linguistic relation. The learning algorithm exploits low-rank bilinear forms and induces low-dimensional embeddings of the lexical space tailored for the target linguistic relation. An advantage of imposing low-rank constraints is that prediction is expressed as th...
Expressive Power and Consistency Properties of State-of-the-Art Natural Language Parsers
We define Probabilistic Constrained W-grammars (PCW-grammars), a two-level formalism capable of capturing grammatical frameworks used in two state-of-the-art parsers, namely bilexical grammars and stochastic tree substitution grammars. We provide embeddings of these parser formalisms into PCW-grammars, which allows us to derive properties about their expressive power and consistency, and relatio...
Word embeddings and discourse information for Quality Estimation
In this paper we present the results of the University of Sheffield (SHEF) submissions for the WMT16 shared task on document-level Quality Estimation (Task 3). Our submissions explore discourse and document-aware information and word embeddings as features, with Support Vector Regression and Gaussian Process used to train the Quality Estimation models. The use of word embeddings (combined with b...
Using Self-Trained Bilexical Preferences to Improve Disambiguation Accuracy
A method is described to incorporate bilexical preferences between phrase heads, such as selection restrictions, in a Maximum Entropy parser for Dutch. The bilexical preferences are modelled as association rates which are determined on the basis of a very large parsed corpus (about 500M words). We show that the incorporation of such self-trained preferences improves parsing accuracy significantly.